Note: with all tables and graphs scaled to 10 cm x 7 cm, the content totals 10 pages.

Introduction

The Auckland International Airfield (AUC) plans to expand and become the central flight hub of Borealia. Given its central location, AUC aims to increase its flight frequency by enlarging its gates and offering more connecting flights to the surrounding countries, in order to increase the number of passengers visiting AUC. More passengers increase airfield revenue through passenger purchases (at airfield-owned stores) and higher leasing fees (from stores renting space at the airfield). Based on information from AUC corporate, the average amount a passenger spends in an airport is a function of the time spent there. Beyond a certain point, however, additional time no longer increases spending but does increase the load on store capacity (think of a restaurant customer who occupies a table for five extra hours after paying the bill). It is therefore important that the flow of passengers does not stall, for example because of many missed flights or a backlog at the security checkpoints. We have been tasked with investigating departing flight patterns and the time passengers spend in the airport.

Main Objective & Goals

We have two main goals. First, we explore flight delays and the amount of time passengers lose because of them. Second, we conduct a queueing analysis of the security checkpoints and propose a formula for the optimal number of active security staff given the queueing traffic at the checkpoints.

Datasets

Of the four available datasets (dat_P_sub_c, dat_F_sub, BASA_AUC_912 and years20262030), we use dat_P_sub_c and dat_F_sub. The “years” dataset spans only one week, which is too short to support meaningful conclusions. The “BASA” dataset appears to contain the same information as dat_P and dat_f, but it also includes variables we were unable to interpret (such as arrival time).

Variable name | Description | Data type | Measurement scale | Accepted values

Dataset: dat_P_sub_c
Act_Departure | actual departure time of the passenger's flight | date | ordinal | 2028-09-01 to 2028-12-31
C0 | number of servers open at the end of the queue | int | numerical | 0 to 3
Delay_In_Seconds | delay of the passenger's flight, in seconds | int | numerical | -1440 to 100200
Flight_ID | ID of the specific flight the passenger is taking | int | categorical | 18095 to 21678
S2 | passenger's arrival time at the server | date | ordinal | 2028-09-01 10:04 to 2028-12-31 9:54
Wait_Time | time spent in the queue, in minutes | int | numerical | 1 to 75

Dataset: dat_f_sub
BFO_Destination_Country_Code | code of the country where the flight arrives | String | categorical | AUC, BOR, NEN, SCO, VES, WIC
Flight_ID | ID of the specific flight a plane is operating | int | categorical | 18095 to 21678
Delay_In_Seconds | delay of the flight, in seconds | int | numerical | -1440 to 100200

Delayed Flight Analysis

Exploration of Flights

Fig 1.0 shows the flow of flights from AUC (left node) to their destinations (middle nodes). The flows on the right categorize flight status (early, on time, or late by 30 minutes or more). From this diagram, the share of late flights per destination is: BOR = 12.7%, VES = 17.5%, NEN = 21.2% and SCO = 24.1%.
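The late ratios above can be reproduced from dat_F_sub with a short grouping step. The sketch below is a minimal, hypothetical illustration in plain Python: it assumes each flight is a dict keyed by the data-dictionary column names, and it assumes the Fig 1.0 statuses are defined as early (negative delay), on time (less than 30 minutes late) and late (30 minutes or more). The function names and sample rows are illustrative, not AUC data.

```python
from collections import Counter

LATE_THRESHOLD_S = 30 * 60  # assumed cut-off: "late" means delayed 30 minutes or more


def flight_status(delay_s: int) -> str:
    """Classify a flight by Delay_In_Seconds (assumed status definitions)."""
    if delay_s < 0:
        return "early"
    if delay_s < LATE_THRESHOLD_S:
        return "on time"
    return "late"


def late_share_by_destination(flights):
    """Return {destination code: fraction of its flights that were late}."""
    totals = Counter()
    late = Counter()
    for f in flights:
        dest = f["BFO_Destination_Country_Code"]
        totals[dest] += 1
        if flight_status(f["Delay_In_Seconds"]) == "late":
            late[dest] += 1
    return {dest: late[dest] / n for dest, n in totals.items()}


# Hypothetical sample rows, not taken from dat_F_sub
sample = [
    {"BFO_Destination_Country_Code": "BOR", "Delay_In_Seconds": -120},
    {"BFO_Destination_Country_Code": "BOR", "Delay_In_Seconds": 2400},
    {"BFO_Destination_Country_Code": "SCO", "Delay_In_Seconds": 1800},
    {"BFO_Destination_Country_Code": "SCO", "Delay_In_Seconds": 600},
]
print(late_share_by_destination(sample))  # {'BOR': 0.5, 'SCO': 0.5}
```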

Flights can be delayed for many reasons, such as the aircraft arriving late at the airfield, mechanical issues or unpredictable weather, but there are ways for the airfield to mitigate these delays.

One possible solution is to make sure that the security checkpoint queues are handled by an appropriate number of servers (discussed in the Queueing Analysis section). Another is for AUC to update its flight schedules: if a flight is scheduled to depart at 10:00 but is delayed by 30 minutes on average, future flights on the same route could be scheduled to depart at 10:30.
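The rescheduling rule can be sketched as follows. This is a hypothetical illustration, not an AUC procedure: it assumes the new departure time is the current scheduled time shifted by the route's mean historical delay, rounded to the nearest minute, and that routes whose flights leave early or on time on average keep their schedule. The function name is illustrative.

```python
from datetime import datetime, timedelta
from statistics import mean


def adjusted_departure(scheduled: datetime, past_delays_s: list[int]) -> datetime:
    """Shift a scheduled departure by the route's mean past delay (hypothetical rule)."""
    mean_delay_s = mean(past_delays_s)
    if mean_delay_s <= 0:
        # Flights on this route leave early or on time on average: keep the schedule
        return scheduled
    shift_min = round(mean_delay_s / 60)  # nearest whole minute
    return scheduled + timedelta(minutes=shift_min)


# Example from the text: a 10:00 departure delayed by 30 minutes on average
print(adjusted_departure(datetime(2028, 9, 1, 10, 0), [1500, 1800, 2100]))
# 2028-09-01 10:30:00
```

A mean is sensitive to a few very long delays (Delay_In_Seconds reaches 100200); a median or a high percentile may be a more robust choice in practice.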

Fig 1.0 also shows that NEN and SCO, the two destinations with the highest late ratios, account for 193 and 54 flights respectively. AUC may want to conduct a cost-benefit analysis to determine whether these routes are worth maintaining. Another factor affecting flight delays is seasonality.

Exploration of Passengers

The effects of seasonality can be seen in Fig 1.1, which breaks each month down by day of the week and time of day.

The number of delayed passengers generally increases from month to month, with one exception: October delays are higher than November delays. One possible explanation is the increase in passengers around Thanksgiving. December has by far the largest number of passengers with delayed flights, which could have two causes. First, December is a holiday season (Christmas and New Year's), which increases flight traffic. Second, weather conditions tend to be harshest around December.

Lastly, Fig 1.1 shows that delays are concentrated in the evening, the period leading up to the last departure at 22:00. A likely explanation is that delays propagate: a delay in the morning leads to further delays throughout the day, which accumulate by the evening.

Queueing Analysis

Queueing Wait Times

This visualization shows the average security waiting times at AUC over the four-month period. The time series plot may suggest that analyzing the security check queues is straightforward. However, several other factors must be considered, such as the arrival rates, the average number of servers and the estimated service rate.

Queue Modeling

Our data covers 17 weeks of passenger flight information, which we split into weekdays and weekends because the influx of passengers differs significantly between the two. We further split each group into four-hour intervals from 6:00 to 22:00 to compare traffic across the morning, afternoon, evening and night periods. The first step of our queueing model is to obtain an arrival rate for each cluster, tallied in the following table:

[Table 1: Number of arrivals and average arrival rate per minute for each cluster]
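The clustering described above can be sketched as a function that maps a passenger's arrival timestamp (S2) to a (day type, interval) label. This is a minimal sketch under assumptions: Saturday and Sunday count as the weekend, the four intervals are 6:00-10:00, 10:00-14:00, 14:00-18:00 and 18:00-22:00 with the start hour included, and arrivals outside 6:00-22:00 are dropped. The interval labels are hypothetical names.

```python
from datetime import datetime
from typing import Optional

# Hypothetical labels for the four 4-hour intervals between 6:00 and 22:00
INTERVALS = [(6, "morning"), (10, "afternoon"), (14, "evening"), (18, "night")]


def cluster_of(arrival: datetime) -> Optional[tuple[str, str]]:
    """Return (day type, interval label) for an arrival, or None outside 6:00-22:00."""
    if not 6 <= arrival.hour < 22:
        return None
    day_type = "weekend" if arrival.weekday() >= 5 else "weekday"  # Sat=5, Sun=6
    # Pick the latest interval whose start hour is not after the arrival hour
    label = next(name for start, name in reversed(INTERVALS) if arrival.hour >= start)
    return day_type, label


print(cluster_of(datetime(2028, 9, 1, 8, 15)))  # ('weekday', 'morning')
```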

We obtained the average arrival rate per minute by dividing the number of arrivals (count) by the number of minutes in the cluster. Interestingly, the arrival rate is higher in the morning and lower at night.
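The arrival-rate calculation is one division per cluster. In the minimal sketch below, the number of minutes in a cluster is assumed to be the number of days of that type in the study period multiplied by the 240 minutes of each four-hour interval, assuming exactly 17 full weeks; the arrival count is a made-up placeholder, not a value from Table 1.

```python
MINUTES_PER_INTERVAL = 4 * 60  # each cluster covers a 4-hour window per day

# Assuming exactly 17 full weeks: 17 * 5 weekdays and 17 * 2 weekend days
WEEKDAYS, WEEKEND_DAYS = 17 * 5, 17 * 2


def arrival_rate_per_minute(arrivals: int, n_days: int) -> float:
    """Average arrivals per minute: count / total minutes observed in the cluster."""
    return arrivals / (n_days * MINUTES_PER_INTERVAL)


# Hypothetical count: 40,800 weekday-morning arrivals over 85 * 240 = 20,400 minutes
print(arrival_rate_per_minute(40_800, WEEKDAYS))  # 2.0
```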

Next, we look at the number of servers open in each time period. Within a period, the number of servers at each checkpoint can change at any moment (employee breaks, shift changes, queue lengths). However, the server count we used was recorded every 15 minutes, which limits the discrepancy between the actual and the reported number of servers.

[Table: Average number of open servers and proportion of time with 1, 2 or 3 servers open, by cluster]

The average number of servers does not follow the pattern of a higher influx in the morning. On weekend evenings, three servers are open 6.6% of the time, the highest proportion of any cluster, even though this cluster has a comparatively low arrival rate. It would therefore probably be more efficient to open fewer servers in the evening and more in the middle of the day. A possible explanation for this mismatch is that the airport closes at 22:00, so staff need to serve all remaining passengers before closing without running past business hours.
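The server statistics can be summarized from the 15-minute snapshots of open servers. As a hypothetical sketch, assume each cluster comes with a list of snapshot counts (0 to 3 servers, as in the data dictionary); the mean and the share of snapshots with three servers open are then computed as below. The window values are invented for illustration.

```python
from statistics import mean


def server_summary(snapshots: list[int]) -> dict[str, float]:
    """Mean open servers and share of 15-minute snapshots with all 3 servers open."""
    return {
        "mean_servers": mean(snapshots),
        "share_three_open": snapshots.count(3) / len(snapshots),
    }


# Hypothetical snapshots for one 4-hour window (16 readings at 15-minute spacing)
window = [1, 1, 2, 2, 2, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 2]
print(server_summary(window))  # {'mean_servers': 1.5, 'share_three_open': 0.0625}
```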

This leads us to a second concept, server performance, which we measure as the proportion of passengers who waited less than a given amount of time:

[Table: Server performance, the proportion of passengers waiting less than a given time, by cluster]

First, the count column of this table differs from the count in Table 1 because, for this part of the analysis, we removed all passengers with no wait time (time spent between S1 and S2). Passengers who did not wait give no insight into server performance: either they were staff who probably skipped the checks at S1 and S2, or the main queue was empty when they arrived. In both cases their wait is unrelated to the number of open servers, and including them would bias the performance measure.

Second, on both weekdays and weekends, the clusters with the highest average wait time are the morning clusters (6:00 to 10:00), even though they already have a large number of servers open. On weekdays, 0.06% of passengers even wait more than 30 minutes, which may indicate that the number of servers is insufficient.
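The performance measure with this exclusion can be sketched as follows. It is a minimal illustration: wait times are assumed to be in minutes (as for Wait_Time), zero waits are dropped as described above, and the sample values and thresholds are hypothetical.

```python
def performance(wait_minutes: list[float], threshold: float) -> float:
    """Share of waiting passengers who waited less than `threshold` minutes.

    Passengers with a zero wait (staff, or an empty queue) are excluded first,
    since they carry no information about the number of open servers.
    """
    waited = [w for w in wait_minutes if w > 0]
    return sum(w < threshold for w in waited) / len(waited)


waits = [0, 0, 3, 5, 12, 18, 31, 4]  # hypothetical sample with two zero waits
print(performance(waits, 10))  # 0.5
```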

Using the M/M/1 queueing model, we can estimate the probability of waiting up to x units of time from the arrival rate and the service rate.
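For reference, the standard M/M/1 results behind this step are given below, with arrival rate λ, service rate μ and λ < μ. The analysis does not state whether the waiting time is measured in the queue only (W_q) or including service (W); since Wait_Time is defined as time spent in the queue, both forms are listed.

```latex
\rho = \frac{\lambda}{\mu}, \qquad
\Pr(W_q \le x) = 1 - \rho\, e^{-(\mu - \lambda)x}, \qquad
\mathbb{E}[W_q] = \frac{\lambda}{\mu(\mu - \lambda)}, \qquad x \ge 0,

\Pr(W \le x) = 1 - e^{-(\mu - \lambda)x}, \qquad
\mathbb{E}[W] = \frac{1}{\mu - \lambda}.
```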

[Table: M/M/1 estimated service rate, estimated ρ and performance, by cluster]

We estimated the service rate from the known arrival rate and the observed average wait time. The estimated ρ = λ/μ is the traffic intensity of the queueing system; if it is 1 or greater, the queue is unstable (it grows without bound), so the performance levels, which measure the quality of service, are not meaningful. We used the M/M/1 model to validate our raw data, and the modelled performance matches the empirical performance in the previous table.
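The estimation step can be sketched by inverting the M/M/1 mean queue wait E[W_q] = λ/(μ(μ − λ)) for μ, which gives a quadratic with one positive root. This is an assumption about the method: the analysis only says the service rate was derived from the arrival rate and the average wait time, not which M/M/1 formula was inverted. Rates are per minute and waits in minutes; the example numbers are hypothetical.

```python
import math


def service_rate_from_wait(lam: float, mean_wait: float) -> float:
    """Solve mean_wait = lam / (mu * (mu - lam)) for mu > lam (M/M/1 queue wait)."""
    # Rearranged: mean_wait*mu**2 - mean_wait*lam*mu - lam = 0; take the positive root
    a, b, c = mean_wait, -mean_wait * lam, -lam
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def prob_wait_at_most(lam: float, mu: float, x: float) -> float:
    """M/M/1 performance: P(W_q <= x) = 1 - rho * exp(-(mu - lam) * x)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("rho >= 1: the queue is unstable, performance is undefined")
    return 1 - rho * math.exp(-(mu - lam) * x)


mu = service_rate_from_wait(lam=1.0, mean_wait=0.5)
print(mu, 1.0 / mu)  # 2.0 0.5
```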

The following table summarizes these results:

[Table: Summary of arrival rate, number of servers and service rate (total and per server), by cluster]

This gives the arrival rate per server and the service rate per server, the two variables used in our regression, shown in the following plot:

The values fall close to the regression line, which suggests a linear relationship between the arrival rate per server and the service rate per server.
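The regression is an ordinary least-squares fit of service rate per server on arrival rate per server, one point per cluster. Writing the fit as μ/c = a + b(λ/c), with c the number of servers, and multiplying through by c gives the form μ = ac + bλ used below; that link is our reading of the equation, not something the analysis states. The sketch uses plain Python and hypothetical data points.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical per-server rates (per minute), one point per cluster
arrival_per_server = [0.5, 0.8, 1.1, 1.4]
service_per_server = [1.10, 1.40, 1.70, 2.00]
a, b = fit_line(arrival_per_server, service_per_server)
print(round(a, 3), round(b, 3))  # 0.6 1.0
```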

We then use the regression to check the previously obtained values, giving the following tables:

[Table: Regression service rate and regression ρ, by cluster]

We obtained the updated service rate from the regression equation 𝜇 = ac + b𝜆, where c is the number of servers and a and b are the regression coefficients, and the regression ρ by dividing the arrival rate by the regression service rate (ρ = 𝜆/𝜇).
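Given the fitted coefficients a and b, the per-cluster regression service rate and traffic intensity follow directly. A minimal sketch, with c the number of open servers and hypothetical values for a, b, c and λ:

```python
def regression_service_rate(a: float, b: float, c: float, lam: float) -> float:
    """Service rate from the regression: mu = a*c + b*lam."""
    return a * c + b * lam


def regression_rho(a: float, b: float, c: float, lam: float) -> float:
    """Traffic intensity rho = lam / mu, using the regression service rate."""
    return lam / regression_service_rate(a, b, c, lam)


# Hypothetical cluster: 2 servers, 1.5 arrivals per minute, a = 0.5, b = 1.0
print(regression_service_rate(0.5, 1.0, 2, 1.5), regression_rho(0.5, 1.0, 2, 1.5))  # 2.5 0.6
```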

With these regression-based performance estimates, we can predict the mean number of servers and compare it with the actual number of servers.

[Table: Predicted versus actual mean number of servers, by cluster]

We calculated the predicted mean number of servers by setting y = (ac + b𝜆)x equal to the corresponding value of the Lambert W function; once y is known, we can solve for c, the predicted mean number of servers. This table confirms our initial observations: for the morning cluster, the predicted mean number of servers is higher than the actual number. It would therefore be advisable to add a fourth server in the morning, or to keep at least three servers open for part of the period so as to reach a mean of 1.74 servers. Another option would be to reduce the number of servers in the afternoon and add some in the evening, even though the arrival rate is low then. Overall, the actual number of servers appears insufficient in the clusters where the predicted number is high, which suggests that AUC should staff more servers in those busier clusters.
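One way to make the Lambert W step concrete is sketched below. It assumes the M/M/1 performance P(W_q ≤ x) = 1 − ρe^{−(μ−λ)x} with μ = ac + bλ; setting this equal to a target performance p and rearranging gives μx·e^{μx} = λx·e^{λx}/(1 − p), so y = (ac + bλ)x = W(λx·e^{λx}/(1 − p)) and c = (y/x − bλ)/a. This derivation is our reconstruction of the step, not a formula stated in the analysis. The principal branch W₀ is computed with Newton's method to stay within the standard library (scipy.special.lambertw would also work); the function names and example values are hypothetical.

```python
import math


def lambert_w0(z: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch W0(z) for z >= 0, i.e. the w >= 0 solving w * exp(w) = z."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # W0(z) <= log(1 + z), so Newton descends monotonically
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1))
        w -= step
        if abs(step) < tol * (1 + w):
            break
    return w


def predicted_servers(lam: float, p: float, x: float, a: float, b: float) -> float:
    """Servers c for which an M/M/1 queue with mu = a*c + b*lam reaches P(W_q <= x) = p."""
    z = lam * x * math.exp(lam * x) / (1 - p)
    y = lambert_w0(z)  # y = mu * x = (a*c + b*lam) * x
    return (y / x - b * lam) / a


# Hypothetical check: lam = 1, x = 1 and a target p matching mu = 2
p = 1 - 0.5 * math.exp(-1)
print(round(predicted_servers(1.0, p, 1.0, a=0.5, b=0.5), 6))  # 3.0
```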

[Table: Regression of predicted on actual mean number of servers]

We find that predicted mean number of servers = 1.0136 × (actual number of servers), where 1.0136 is the regression estimate of the checkpoint departure d. Values of d close to 1 at nearly all checkpoints further validate the combined model.
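A single coefficient like d comes from regressing the predicted mean number of servers on the actual number without an intercept. Assuming that is the model used (the analysis does not say whether an intercept was included), the least-squares estimate is d = Σxy / Σx², sketched here with hypothetical values:

```python
def slope_through_origin(actual: list[float], predicted: list[float]) -> float:
    """Least-squares d for predicted = d * actual (no intercept): sum(x*y) / sum(x^2)."""
    sxy = sum(x * y for x, y in zip(actual, predicted))
    sxx = sum(x * x for x in actual)
    return sxy / sxx


# Hypothetical cluster means: actual vs predicted servers
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.3]
print(round(slope_through_origin(actual, predicted), 4))  # 1.0643
```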

Summary & Conclusion

Having investigated departing flight patterns and the time passengers spend in the airport, our main advice, should AUC increase its flight frequency, is to mitigate its current flight delays and to staff security according to our queueing model so that queueing times, which may be correlated with those delays, do not grow further.

Our model indicates that the airfield should open more servers in the morning: this is when the airfield is most crowded, and the current number of servers is not sufficient to serve all passengers in less than 30 minutes.

Another way to reduce wait times is to open fewer servers in the afternoon and more in the last hours before closing, when we observe an increase in delays and in passengers not served within 30 minutes. In other words, rather than assigning more employees to security, AUC may be able to reassign existing staff schedules across the day and build a more efficient system.

As next steps, ASIM analytiq suggests that AUC corporate focus on flight route optimization, scheduling and terminal design in order to minimize future flight delays.

Contributions

Data dictionary: Musembi Nzau, Ilias Fousi
Narrative & story: Musembi Nzau, Ilias Fousi
Document formatting: Musembi Nzau
Interactive visualizations: Musembi Nzau
Editing: Ayaan Arora, Samantha Senecal
Tables: Ayaan Arora, Samantha Senecal, Ilias Fousi
Regression plots: Samantha Senecal
Queueing calculations: Ayaan Arora, Samantha Senecal, Ilias Fousi